ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

نویسندگان

Yuttachon Promworn

Pavita Kaewprommal

Philip J Shaw

Apichart Intarapanich

Sissades Tongsima

Jittima Piriyapongsa

چکیده

BACKGROUND Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. RESULTS We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. CONCLUSION ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

MOTIVATION Chromatin states are the key to gene regulation and cell identity. Chromatin immunoprecipitation (ChIP) coupled with high-throughput sequencing (ChIP-Seq) is increasingly being used to map epigenetic states across genomes of diverse species. Chromatin modification profiles are frequently noisy and diffuse, spanning regions ranging from several nucleosomes to large domains of multiple...

متن کامل

iREAD: A Tool for Intron Retention Detection from RNA-seq data

Summary: Detecting intron retention (IR) events is emerging as a specialized need for RNA-seq data analysis. Here we present iREAD (intron REtention Analysis and Detector), a tool to detect IR events genome-wide from high-throughput RNA-seq data. The command line interface for iREAD is implemented in Python. iREAD takes as input an existing BAM file, representing the transcriptome, and a text f...

متن کامل

Decoding ChIP-seq with a double-binding signal refines binding peaks to single-nucleotides and predicts cooperative interaction.

The comprehension of protein and DNA binding in vivo is essential to understand gene regulation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) provides a global map of the regulatory binding network. Most ChIP-seq analysis tools focus on identifying binding regions from coverage enrichment. However, less work has been performed to infer the physical and regulatory details insi...

متن کامل

A Robust Analytical Pipeline for Genome-Wide Identification of the Genes Regulated by a Transcription Factor: Combinatorial Analysis Performed Using gSELEX-Seq and RNA-Seq

For identifying the genes that are regulated by a transcription factor (TF), we have established an analytical pipeline that combines genomic systematic evolution of ligands by exponential enrichment (gSELEX)-Seq and RNA-Seq. Here, SELEX was used to select DNA fragments from an Aspergillus nidulans genomic library that bound specifically to AmyR, a TF from A. nidulans. High-throughput sequencin...

متن کامل

Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq

In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 12 شماره

صفحات -

تاریخ انتشار 2017

ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

نویسندگان

چکیده

منابع مشابه

A clustering approach for identification of enriched domains from histone modification ChIP-Seq data

iREAD: A Tool for Intron Retention Detection from RNA-seq data

Decoding ChIP-seq with a double-binding signal refines binding peaks to single-nucleotides and predicts cooperative interaction.

A Robust Analytical Pipeline for Genome-Wide Identification of the Genes Regulated by a Transcription Factor: Combinatorial Analysis Performed Using gSELEX-Seq and RNA-Seq

Grouped False-Discovery Rate for Removing the Gene-Set-Level Bias of RNA-seq

عنوان ژورنال:

اشتراک گذاری